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Abstract 


Latent Markov (LM) m odels rep resent an important class of models for the anal¬ 
ysis of longitudinal data ( Bart o lucc i et ah|, 20131 ). especially when response variables 
are categorical. These models have a great potential of application for the analysis 
of social, medical, and behavioral data as well as in other disciplines. We pro¬ 
pose the R package LMest, which is tailored to deal with these types of model. In 
particular, we consider a general framework for extended LM models by including 
individual covariates and by formulating a mixed approach to take into account ad¬ 
ditional dependence structures in the data. Such extensions lead to a very flexible 
class of models, which allows us to fit different types of longitudinal data. Model 
parameters are estimated through the expectation-maximization algorithm, based 
on the forward-backward recursions, which is implemented in the main functions of 
the package. The package also allows us to perform local and global decoding and 
to obtain standard errors for the parameter estimates. We illustrate its use and the 
most important features on the basis of examples involving applications in health 
and criminology. 


Keywords: expectation-maximization algorithm, forward-backward re¬ 
cursions, hidden Markov model 
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1 Introduction 


In this paper we illustrate the R package LMest V2.0, which replaces version VI.0, avail¬ 
able from http://CRAN. R-project. org/package=LMest , The new version extends VI.0 
in several ways and it is aimed at providing a collection of tools that can be used to 


estimate the class of Latent Markov (LM) models for longitudina. 


package can be seen as a companion to the book 


Bartolucci et al 


catego rical data. The 
(1201311 that is focused 


on these models; we will closely refer to this book for the theoretical and s ome p racti cal 
details. Additional insights are given in the discussion paper by Bartolucci ct al. ( 2014b ). 

LM models are designed to deal with univariate and multivariate longitudinal data 
based on the repeated observation of a panel of subjects across time. More in detail, LM 
models are specially tailored to study the evolution of an individual characteristic of in¬ 
terest that is not directly observable. This characteristic is represented b y a latent process 
following a Markov chain as in a Hidden Markov (HM) model (Z ucchi ni and MacD o nald . 


2009 ). These models also allow us to account for time-varying unobserved heterogeneity 


in addition to the effect of observable covariates on t 


The initial LM formulation introduced bvlWi ggins 


re response variables. 


(1973), see also Wiggins (119551 ). has 


been developed in several directions and in connection with applications in many fields, 
such as psychology, sociology, economics, and medicine. In particular, the basic LM model, 
relying on a homogenous Markov chain of first order, has several extensions based on pa- 
rameterizations that allow us to include hypotheses and constraints of interest. The most 


important extension is for the inclusion of individua 


bution of the latent process (IVermunt et al. 


1999 


covariates that m ay af fect the distri- 


Bartolucci et al. 


tional distribution of the response variables given this process (iBartolucci and Farcomeni . 


2007) or the condi 


20091 ). LM models may also be formulated to take into account additional dependence 


structures in the data. ' 

n particular, the basic LM 

model has been generalized to the 

mixed LM approach by 

van de Pol and 7 

.aneeheine 

(1990 

) and 

o the LM model with 

random effects ( 

Altman, 

2007; 

Maruotti, 

2011; 

Bartolucci et al.. 

2011 

). These formu- 


lations are related to extended models in which the param eters are all owed to va ry in 


different latent subpopulations; for a complete review see 


Bartolucci et al. 


(120131) and 
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Bartolucci et al. 


(2014b). 


As already mentioned, LM models are closely related to HM models (Zucchini and MacDonald, 


2009]), which are specific for time-series data. LM models are formulated quite similarly, 


but they are tailored for longitudinal data where many subjects are observed only at few 
occasions, typically no more than ten. Most importantly, differently from LM models, 
extensions of HM models to include covariates or for complex data structures are rarely 
applied because these structures are typical of longitudinal studies. 


In R there are some packages that can handle models that are rela ted to 


In particular , HM models can be fit throu gh packages HM M (Himmelmannj, 


M models. 


20101), Hid- 


denMarkov flHartel. 120121 ) , and depmixS4 flVisser and Speekenbrinkl . 120101) . The last one 
is the most closely related to LMest as it is tailored to deal with HM models based on a 
generalized linear formulation, which can be specified conditionally on individual covari¬ 
ates. depmixS4 is nonetheless designed for re peated measurements on a sin gle unit, as 


in a time-seri es c o ntext . Packages mhsmm (O'Connell and Hojsgaan 


(Bulla and Bullal . 
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2011) and hsmm 

2013) deal with hidden semi-Markov models, which are HM models in 


which a se parate as sumptions are specified for persistency in a latent state. Finally, pack¬ 
age msm flJackson! 20111) can deal with HM and related models evolving in continuous 
time. A commercial software which can be used to fit certain LM models is Latent Gold 


flVermnnt and Magidson! 120031) . Additionally, certain Matlab toolbox e s are also available 


for this aim; see for example the HMM toolbox proposed by iMu r p hv (119981) . 

The distinguishing features of the LMest package with respect to the ones above are 
the following: 


• it is primarily designed to work with longitudinal data, that is, with (even many) 
i.i.d. replicates of (usually short) sequences; 

• it can deal with univariate and multivariate categorical outcomes; 

• standard errors for the parameter estimates are obtained by exact computation or 
through reliable approximations of the observed information matrix; 

• additional random effects can be used to build mixed LM models; 
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• individual covariates are included through efficient parameterizations of either the 
measurement or the latent model; 

• computationally efficient algorithms are implemented for predictions; 

• the package is flexible and it minimizes the use of time resources by relying on 
certain Fortran routines. 

The paper is organized as follows. Section [2] briefly outlines the general formulation 
of LM models and their maximum likelihood estimation, together with a brief description 
of the basic LM model. Section [3] describes the use of the LMest package to estimate 
LM models with individual covariates included in the measurement model, while in Sec¬ 
tion 0 we deal with individual covariates affecting the distribution of the latent process. 
Both formulations are illustrated through examples based on real data about self-reported 
health status. In Section [5] we introduce the mixed LM models and we describe the use 
of the R function aimed at estimating this model formulation. Here, the illustrative ex¬ 
ample is based on data in the field of criminology. Finally, Section [6] contains the main 
conclusions. 


2 Latent Markov models for longitudinal data 


In the following we provide a brief review of the statistical methodology on which the 
LM models are based, trying to reduce the mathematical deta ils as much as possible. 


The illustration closely follows the recent paper by 


Bartolucci et ah 


fl2014bl ). We focus 


in particular on the maximum likelihood est imation of these models on the basis of the 
Expectation-Maximization (EM) algorithm (Dempster et ah, 1977). We also deal with 
more advanced topics which are important for the correct use of the LMest pack age, su ch 
as the selection of the number of latent states and local and global decoding flViterbil . 


19671: 


J ua ng and Rabiner. 1991). 


2.1 The general latent Markov model formulation 

Consider the multivariate case where we observe a vector of r categorical response 
variables, Y^\ with Cj categories, labeled from 0 to Cj — 1, j = 1,... ,r, which are available 
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at the f-th time occasion, t = 1 ,,T. Also let Y be the vector obtained by stacking Y® 
for t = 1,..., T; this vector has then rT elements. Obviously, in the univariate case we 
have a single response variable Y® for each time occasion, and Y is made by T elements. 
When available, we also denote by the vector of individual covariates at the f-th 
time occasion and by X the vector of all the individual covariates which is obtained by 
stacking the vectors X^\ for t — 1,..., T. 

The general formulation of the model assumes the existence of a latent process, denoted 
by U — (U^\ ..., U^), which affects the distribution of the response variables. Such 
a process is assumed to follow a first-order Markov chain with state space {l,...,fc}, 
where k is the number of latent states. Under the local independence assumption, the 
response vectors Y^\... ,Y^ are assumed to be conditionally independent given the 
latent process U. Moreover, the elements Y ( - l} within Y^\ t = 1 ,,T, are conditionally 
independent given fWk This assumption leads to a strong simplification of the model, 
but it can be relaxed by allowing serial dependence through the inclusion of the lagged 


response variables among covariates (see 


Bartolucci and Farcomeni, 


2009 


among others). 


Parameters of the measurement model are the conditional response probabilities 


4U = p( Y f = y\ u{t) = x[t) = *)’ i = i’ • • • > r , v = o, • • ■, c ; -1, 


which reduce to 


rfL = P(Y {t) = v\U {t) = u, X (t) = *), y = 0,..., c - 1, 


in the univariate case, where t — 1,..., T and u — 1,..., k. 
Parameters of the latent process are the initial probabilities 


7T, 


u\X 


= piU^ = u\X (l) = x), u — 1,..., k, 


and the transition probabilities 

^%x = P( u{t) = u \U {t ~ 1] = u, X (t) = x), t = 2,..., T, u, u = 1,..., k. 

In the expressions above, x denotes a realization of X^\ y denotes a realization of 
Yj l \ u denotes a realization of U^\ and u denotes a realization of Ijk~ l ). 
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On the basis of the above parameters, the conditional distribution of U given X may 
be expressed as 

P(U = u\X = x) = vr u (i)| a; (i) 'K u ( t ')\u( t - 1 ')x ( t '>> 

t> 1 

where u = (w^,..., u^) and x denotes a realization of the vector of all response variables 
X. Moreover, the conditional distribution of Y given U and X may be expressed as 

p(Y = y\U=u,X=x)=\{^, M , )xm , 

t. 

where, in general, we define <Pyl uX = p{Y i - t) = y\U^ = u,X {t) = x) and, due to the 
assumption of local independence, we have 

^y\ux = II ^jvAux- 

j 

Note that in the above expressions y is a realization of Y made by the subvectors y^ = 
(yi \ • • •, yr) whereas y is a realization of Y^ with elements yj, j = 1 ,..., r. 

When we deal with the inclusion of individual covariates, the manifest distribution of 
the response variables corresponds to the conditional distribution of Y given X , defined 
as 


p(y\x) = p(Y = y\X = x) = ^ ^uW\xm^% lu (i) x ( 2) ■ ■ ■ 


U 


X ^yW\uWxW ' ' '0ym\ u (T)x( T )- ( 1 ) 

In the basic version of the model, all assumptions are not formulated conditionally 


on individual covariates. Moreover, when such covariates are available, we suggest to 
avoid that they simultaneously affect both the distribution of the latent process and the 
conditional distribution of the response variables given this process. In fact, the two 
formulations have different interpretations, as explained in more detail in the following, 
and the resulting model would be of difficult interpretation and estimation. 

Finally, it is important to note that computing p(y\x), or p(y) in the basic version, 
involves a sum extended to all the possible k T configurations of the vector u ; this typ¬ 


ically requires a considerable computational effort. In order to efficient 
a_pro 


aability we can use a forward recursion (Baum et al. 


1970|); see 


y compute such 


Bartolucci et al. 


(12013i ) for details. 
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2.2 Maximum likelihood estimation 


Given a sample of n independent units that provide the response vectors y 1 ,..., y nl and 
given the corresponding vectors of covariates X \,..., x n , the model log-likelihood assumes 
the following expression 

n 


W = ^^ogpiy^Xi). 


1=1 


Note that in this case each vector y i is a realization of Y * which, in the multivariate case, 
is made up of the subvectors yf\ t = 1,..., T, that, in turn, have elements yfl , j = 
1,... ,r. Moreover, Xi may be decomposed into the time-specific subvectors of covariates 
x\ ,..., x\ . Also note that pipy^Xi) corresponds to the manifest probability of the 
responses provided by subject i, given the covariates; see equation (JTJ) . Moreover, 0 is the 
vector of all free parameters affecting piyy^Xi). 


The above log-like 


1970; 


Dempster et al. 


rood function can be maximized by the EM algorithm flRanm et al 


1977), as described in the following section. 


2.2.1 Expectation-Maximization algorithm 

The EM algorithm is based on the complete data log-likelihood that, with multivariate 
categorical data, has the following expression 

£ *( 0 ) = a fnxy + l0 § ^ 1 ® 

j =1 t= 1 u=l X y =0 u =1 X 

+ Yl Yl J2 J2 b ^x , ( 2 ) 

t =2 u= 1 u= 1 X 

where, with reference to occasion t and covariate conhguration x, a^ Xy is the number of 
subjects that are in the latent state u and provide response y to response variable j, b^ x 
is the frequency for latent state u, and b\^ uX is the number of transitions from state u to 
state u. 

The EM algorithm alternates the following two steps until convergence: 

• E-step: it consists of computing the posterior (given the observed data) expected 
value of each indicator variable involved in (J2]) by suitable forward-backward re- 
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cursions ( Baum et ajj, 119701 ); these expected values are denoted by a^ x , b^x, and 


u uuXl 

M-step: it consists of maximizing the complete data log-likelihood expressed as in 
(J2]), with each indicator variable substituted by the corresponding expected value. 
How to maximize this function depends on the specific formulation of the model and, 
in particular, on whether the covariates are included in the measurement model or 
in the latent model. 


We refer the reader to 


Bartolucci et al 


(2013)) for a detailed description of these steps, 


which are implemented in suitable functions of the LMest package. 

The EM algorithm could converge to a mode of the log-likelihood which does not cor¬ 
respond to the global maximum. In this regard, we suggest to combine a deterministic 
and a random starting rule for the EM algorithm, so to take as final estimate the one 
corresponding to the highest log-likelihood obtained at convergence of the algorithm; the 
corresponding estimate is denoted by 6. This may prevent problems due to the multi¬ 
modality of this function. In particular, the random initialization is based on suitably 
normalized random numbers drawn from a uniform distribution from 0 to 1 for the initial 
and transition probabilities of the Markov chain and for the conditional response prob¬ 
abilities. The convergence of the EM algorithm is checked on the basis of the relative 
log-likelihood difference, that is, 


(£(6>) Cs) — £(0) (s - 1) )/|£(0) (s) | < e, 


(3) 


where 0 ^ is the parameter estimate obtained at the end of the s-th M-step and e is a 
predefined tolerance level which can be set by the users. 

2.2.2 Standard errors 

After the model is estimated, standard errors may be obtained on the basis of the observed 
information matrix, denoted by J(0 ), by standard rules. In particular, they are computed 
through the squared root of the diagonal elements of the inverse of this matrix. The 
LMest package computes this matrix, an d then provides the standard err ors, by using 
either the numerical method proposed by iBartolucci and Farcomenil (120091 1 or the exact 











Bartolucci and Farcomeni (2014b), according to the 


computation method proposed by 
model of interest. 

According to the approximated method, J{0) is obtained as minus the numerical 
derivative of the score vector s(0) at convergence. The score vector, in turn, is obtained 
as the first derivative of the expected value of the complete data log-likelihood, which is 


based on the expected freq uencies , and 6 b,!^ c omputed with t he final parameter 


m 




estimates 6 ; for details see 




Bartolucci et al. 


(2013) and 


Penn on il (20.1 


J). 


The exact computation of J(0) is instead based on the Oakes’ identity flOakesl . 1999j) 


This method uses the complete data information matrix, produced by the EM algorithm, 
and a correction matrix computed on the basis of the first derivative of the posterior 
probabilities obtained from the backward-forward recursions. 

For the basic LM model and for the model with individual covariates affecting the 


distribution of the latent process, t 


ie_LMest packag e also provides functions aimed at 


performing a parametric bootstrap (Davison and Hinklcv, ll997f) in order to obtain stan¬ 
dard errors. 


2.3 Criteria for selecting the number of latent states 


In certain applications, the number of latent states, k , can be a priori defined, as in the 
univariate case in which the number of latent states may be fixed equal to the number 


of response categories. Otherwise, the following criteria are typical 


number of latent states: Akaike information criterion (AIC) of 


Akaike 


y used to select the 


information criterion (BIC) of Schwarz (11978 1. They are based on the indices 


(119731 1 and Bayesian 


AIC = —2 1 + 2 np, 

BIC = —21 + log(n) np, 


where i denotes the maximum of the log-likelihood of the model of interest and np denotes 
the number of its free parameters. 

According to the above criteria, the optimal number of latent states is the one cor¬ 
responding to the minimum value of the above indices; this model represents the best 
compromise between goodness-of-fit and complexity. However, other criteria may be 
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used, such a s thos e ta king into account 


Bacci et al. 


(2014) and 


Bartolucci et al. 


he qu ality of the classification; for reviews see 


(120144 • 


2.4 Local and global decoding 

The LMest package allows us to perform decoding, that is, prediction of the sequence of 
the latent states for a certain sample unit on the basis of the data observed for this unit. 

In particular, the EM algorithm directly provides the estimated posterior probabilities 
of U^\ that is, plU^ = u\X = x, Y — y), for t — 1,..., T, u — 1,..., k, and for every 
covariate and response configuration ( x , y) observed at least once. These probabilities 
can be directly maximized to obtain a prediction of the latent state of each subject at each 
time occasion t; this is the so-called local decoding. It shall be noted that local decoding 
minimizes the classification error at each time occasion, but may yield sub-optimal or 
even inconsistent predicted sequences of U^\ ..., U 

In order to track the latent state of a subject over time, the most a posteriori likely 
sequence of states must b e obtain e d, thr ough the so-called gl obal decoding , which i s base d 


on an adaptation of the 


Viterbil (119671) algorithm; see also lJuang and Rabinerl (Il99ll) . 


The algorithm proceeds through a forward-backward recursion of a complexity similar to 
the recursions adopted for maximum likelihood estimation within the EM algorithm, so 
that global decoding may be applied even for long sequences of data. 


2.5 Basic latent Markov model 


Following 


Bartolucci et ah 


(i2013m . and as already mentioned above, the basic LM model 


rules out individual covariates and assumes that the conditional response probabilities 
are time homogenous. In symbols, we have that 4>y\ uX = 4> y \ u in the univariate case and 
<t>f y \ux = ^jyW in tlie multivariate case; we also have ir u \ x = n u and 7r^- x = 

In order to fit this model, we can use function est_lm_basic, which is one of the main 
functions in the LMest package. Here, we skip a detailed ill ustrat ion of this function, as 


it is already provided in the Appendix of the book 


Bartolucci et ah 


(201^), and we prefer 


to focus on functions to estimate more sophisticated LM versions which are only recently 
available in the package. These versions include individual covariates and further latent 
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variables to cluster subjects in different subpopnlations (mixed LM model). 


3 Covariates in the measurement model 


When the individual covariates are included in the measurement model, the conditional 
distribution of the response variables given the latent states may be parameterized by 
generalized logits. In such a situation, the latent variables account for the unobserved 
heterogeneity, that is, the heterogeneity between subjects that we cannot explain on the 
basis of the observable covariates. The advantage with respect to a standard random- 
effects or latent class model with covariates is that the effect of the latent variables for 

'or a further discussion see 


Bartolucci and Farcomeni 

(2009 

) and 

' - -/ 7 

Pennoni and Vittadini 

(2013) 


3.1 Assumptions 


In dealing with univariate data in which each response variable has an ordinal nature as 
in the following illustrative example, we denote the number of its response categories by c. 
By retaining the assumption of local independence, and given the nature of the response 
variable, we rely on a parameterization based on global logits of the following type: 


p(Y® > y\U® = u,X (t) = x ) 
° g p(y(t) < y\u(t) = u, X® = X) 


log 


(/)(*) -I- -I- (//*) 

Vy\uX ^ • W ip c -i\ uX 

(W) I jit) 

^UX T ■ ■ ■ ■+■ <Py-l\ uX 


Hy + Ot u + x' (3, 


(4) 


with t = 1 ,,T, y = 1,..., c — 1. Note that these logits reduce to standard logits in 
the case of binary variables, that is, when c = 2. In the above expression, \i y are cut- 
points, a u are the support points corresponding to each latent state, and /3 is the vector 
of regression parameters for the covariates. 

As mentioned in Section I2TT1 the inclusion of individual covariates in the measurement 
model is typically combined with the constraints that n u \ x = n u and x = i t u \ Ui t = 
1,..., T, u, u — 1,..., k, in order to avoid interpretability problems of the resulting model. 
Also note that, under these constraints, the transition probabilities are assumed to be 
time-homogeneous so as to reduce the number of free parameters. 
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3.2 Application to the health related longitudinal data 

The LM model with individual covariates in the measurement model, and then affecting 
the conditional distribution of the response variables given the latent process, is estimated 
by the function est_lm_cov_manifest available in the LMest package. In this section, we 
illustrate the use of this function on the basis of an application to data derived form 
the Health and Retirement Study (HRS) conducted by the University of Michigan^. The 
data concern aspects related to retirement and health among elderly individuals in the 
USA. The sampling design is nationally representative of the population over age 50 years, 
whereas the response variable is the self-reported health status (named SHLT) and it is 
measured on a scale based on five ordered categories: ‘poor’, ‘fair’, ‘good’, ‘very good’, 
‘excellent’. The sample includes n = 7074 individuals followed at T = 8 approximately 
equally spaced occasions. Since each wave is added two years after the previous one, 
unobserved factors which vary during the course of the study may potentially affect the 
health status. The LM model with covariates directly takes this issue into account. 

For every subject, the available covariates are: gender, race, education, and age. The 
data matrix consists of a number of rows equal to the number of subjects, n = 7074, and 
19 columns as follows: 

library ( ’ LMest’ ) 


data(data_SRHS) 
data = data_SRHS 
data = as.matrix (data) 
head (data) 


## 

## [ 1 ,] 
## [ 2 ,] 
## [3,] 
## [4,] 
## [5,] 
## [ 6 ,] 


R1SHLT R2SHLT R3SHLT R4SHLT R5SHLT R6SHLT R7SHLT R8SHLT RAGENDER 


4 

3 

1 

3 

1 

3 


4 

3 

1 

3 

1 

2 


3 

3 
2 

4 
1 
2 


3 

3 

3 

4 
1 
2 


4 

3 

3 

3 

1 

2 


3 

3 

2 

1 

1 

2 


3 

3 

3 

3 

1 

2 


3 

3 

4 

5 
2 
2 


1 

2 

1 

2 

2 

2 


^or more details on the study see http://hrsonline.isr.umich.edu/ 


12 




## 

RARACEM 

RAEDUC 

R1AGEY.E 

R2AGEY.E 

R3AGEY_E 

R4AGEY.E 

R5AGEY.E 

R6AGEY_E 

## [1,] 

1 

3 

56 

58 

60 

62 

64 

66 

## [2,] 

1 

5 

54 

55 

57 

59 

61 

63 

## [3,] 

1 

3 

53 

55 

57 

58 

60 

62 

## [4,] 

1 

5 

36 

38 

40 

42 

44 

46 

## [5,] 

1 

3 

46 

48 

50 

52 

54 

56 

## [6,] 

1 

4 

44 

47 

48 

50 

52 

54 


## R7AGEY_E R8AGEY_E 

## [1,] 68 70 

## [2,] 65 67 

## [3,] 64 66 

## [4,] 48 50 

## [5,] 58 60 

## [6,] 56 58 

We observe that the first eight columns are related to the responses concerning the 
categorical response variable observed at each time occasion, with c = 5 and r — 1. The 
ordered categories are originally coded from 1 to 5. The remaining columns are related 
to the covariates. In particular, gender is coded as 1 for male and 2 for female, whereas 
race has three categories coded as 1 for white, 2 for black, and 3 for others. Finally, 
educational level is represented by five ordered categories coded as 1 for high school, 2 for 
general educational diploma, 3 for high school graduate, 4 for some college, 5 for college 
and above. The last eight columns are related to age recorded at each time occasion. 

The function est_lm_cov_manifest is used to estimate the LM model at issue; it requires 
the following main arguments as input: 

• S: design array for the response configurations (of dimension n x T x r); 

• X: array of covariates; 


• lev: number of response categories of each outcome; 

• k: number of latent states; 

• mod: type of model to be estimated, coded as mod=0 for the model illustrated in 
Section 1X11 based on parameterization (jJD- In such a context, the latent process is 
of first order with initial probabilities which coincide with those of the stationary 
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distribution of the chain. When mod=l the function estimates a model relying on 
the assumption that the latent process has a distribution given by a mixture of 
AR(1) processes with common variance a 2 and specific correlation coefficients p u . 


This model strictly foll ows t he one proposed by 


Bartolucci et ah 


(j2014aj); see also 


Peuuoui and Vittadinil (1201311 for a comparison between the two types of model and 
the function help for further details. 

q: number of support points of the AR structure. When mod=0 it has to be set 
equal to 1; 

tol: tolerance level for the convergence, which corresponds to e in definition (J3J)• 
The default value is 10e-8; 


• maxit: maximum number of iterations before convergence. The default value is 

1000 ; 

• start: equal to 0 for deterministic starting values (default value) and to 1 for random 
starting values; 

• output: equal to TRUE to print additional output. FALSE is the default option; 


• out_se: equal to TRUE to calculate the information matrix and the standard errors. 
FALSE is the default option. 


Function est_lm_cov_manifest has other arguments for which we refer to the help. 

The matrix of the responses S is referred to n = 7074 subjects observed at T = 8 
time occasions with respect to r = 1 response variable. The responses are coded from 
0 (“poor”) to 4 (“excellent”), after modifying the original coding. For example, for the 
data at hand, the individual with id == 5 provides responses “good” or “excellent” at 
each time occasion: 


S = 5-data[, 1 :TT] 
head(S) 

## R1SHLT R2SHLT R3SHLT R4SHLT R5SHLT R6SHLT R7SHLT R8SHLT 
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## [ 1 ,] 
## [ 2 ,] 
## [3,] 
## [4,] 
## [5,] 
## [ 6 ,] 


112 
2 2 2 

4 4 3 

2 2 1 

4 4 4 

2 3 3 


2 

2 

2 

1 

4 

3 


1 

2 

2 

2 

4 

3 


2 2 2 

2 2 2 

3 2 1 

4 2 0 

4 4 3 

3 3 3 


In this application, we consider the covariates gender (included by a dummy variable 
equal to 1 for females), race (by a dummy variable equal to 1 for non-white people), 
educational level (by two dummy variables: the first equal to 1 for some college education 
and the second equal to 1 for college education and above), age (scaled by 50), and age 
squared (again scaled by 50). For example, the covariates configuration of individual with 
id == 3994, at occasion t — 1, is: 


X[, ,3994,1] 


## 

## [ 1 ,] 
## [ 2 ,] 
## [3,] 
## [4,] 


[, 1 ] 

1 

1 

1 

1 


[, 2 ] 
0 
0 
0 
0 


[, 3] 
0 
0 
0 
0 


[,4] 

0 

0 

0 

0 


[, 5] 
1 
1 
1 
1 


[, 6 ] 
1 
1 
1 
1 


The above subject is a female, white (first and second columns), with high school 
diploma (third and fourth columns), begin 50 years old at the first interview (fifth column). 
The last column is the square of age minus 50. Note that an alternative parameterization, 
which may result in a simpler interpretation of the parameter estimates, is formulated 
by considering as covariates the baseline age and the time since the beginning of the 
longitudinal study. 

In order to obtain estimates for the data reported above, we use the following command 
in R: 


est_lm_cov_raanifest(S, X, lev =5, k = 10, q = 1, mod = 0, tol = 10e-08, 

start = 1, output = TRUE, out_se = TRUE) 


In such an application, we note that when k is large the log-likelihood presents several 
local maxima. In order to prevent the problem of the multimodality of the log-likelihood 
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we adopt both deterministic (by using start=0) and random (start=l) initializations of 
the EM algorithm. In particular, the random initialization is performed on the basis of 
a suitable number of starting values proportional to k. Then, we run the above code 
with a number of states k from 1 to 11. For each k, we consider the largest value of the 
log-likelihood at convergence and we compute the BIC so as to choose the number of 
latent states. For the data at hand, the model which results to have the minimum BIC 
is that with A: = 10 latent states. 

Note that the option out_se = TRUE is used for obtaining the standard errors for the 
parameter estimates, by means of the numerical method described in Section 12.21 so as 
to evaluate the effect of the covariates on the probability of responding with a certain 
category. 

In the following we show the results obtained for the model with k = 10 latent states, 
by using the print function, which also returns the main convergence info: 

print (final_newlO) 

## Call: 

## est_lm_cov_manifest(S = S, X = X, lev =5, k = 10, q = 1, mod = 0, 

## tol = le-08, start = 1, output = TRUE, out_se = TRUE) 

## 

## Convergence info: 

## LogLik np AIC BIC 

## [1,] -62579 109 125376 126124 

We also report the results of the summary output for the chosen LM model: 

summary (final_newl0) 

## Call: 

## est_lm_cov_manifest(S = S, X = X, lev =5, k= 10, q = 1, mod = 0, 

## tol = le-08, start = 1, output = TRUE, out_se = TRUE) 

## 

## Coefficients: 

## 

## Vector of outpoints: 

## [1] 8.284 4.543 0.747 -3.573 
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Support points for the latent states: 

[1] -12.179 -7.279 -4.472 -2.279 -1.179 

[9] 5.309 7.532 


-0.158 2.176 4.451 


## 

## 

## 

## 

## 

## Estimate of the vector of regression parameters: 

## [1] -0.226 -1.424 1.534 2.643 -0.124 0.000 

## 

Vector of initial probabilities: 

[1] 0.2219 0.0932 0.0465 0.0644 0.0256 0.0094 0.2180 0.0220 0.2849 0.0142 


## 

## 

## 

## 

## 

## 

## 

## 

## 

## 

## 

## 

## 

## 


Transition matrix: 

[,1] [,2] [,3] [,4] [,5] [, 6] C,7] [,8] [,9] 
[1,] 0.8965 0.0040 0.0111 0.0257 0.0000 0.0019 0.0083 0.0056 0.0384 
[2,] 0.0006 0.8356 0.0000 0.0033 0.0299 0.0146 0.1150 0.0000 0.0010 
[3,] 0.0371 0.0000 0.7998 0.0641 0.0000 0.0000 0.0003 0.0745 0.0103 
[4,] 0.0439 0.0056 0.0618 0.7700 0.0000 0.0031 0.0261 0.0393 0.0272 
[5,] 0.0001 0.1474 0.0000 0.0002 0.8308 0.0171 0.0042 0.0000 0.0001 
[6,] 0.0250 0.1922 0.0001 0.0507 0.0792 0.5893 0.0582 0.0000 0.0050 
[7,] 0.0011 0.0244 0.0000 0.0024 0.0034 0.0047 0.8691 0.0000 0.0688 
[8,] 0.1073 0.0000 0.1046 0.0998 0.0000 0.0000 0.0001 0.6591 0.0184 
[9,] 0.0483 0.0099 0.0004 0.0052 0.0002 0.0014 0.0358 0.0004 0.8911 


## [10,] 0.1247 0.0266 0.0317 0.0789 0.0000 0.0000 0.2421 0.0098 0.3337 

## [, 10 ] 

## [1,] 0.0084 

## [ 2 ,] 0.0000 

## [3,] 0.0138 

## [4,] 0.0229 

## [5,] 0.0000 

## [6,] 0.0004 

## [7,] 0.0262 

## [8,] 0.0105 

## [9,] 0.0074 

## [10,] 0.1524 
## 

## Standard errors for the regression parameters: 

## [1] 0.0762 0.1039 0.1061 0.0925 0.0068 0.0002 


The above output displays the estimated cut-points ft y , the estimated support points 
a u , and the estimated vector of regression parameters (3 as in expression (J4]) . The esti- 


17 


mated coefficients in (3 are reported in the same order adopted to define the matrix X of 
covariates. The list of objects returned by the function, contained in finaLnewlO, may also 
be displayed in the usual way; for a complete list of the arguments returned as output we 
refer to the package help. For example, the standard errors for the estimated regression 
coefficients may be obtained as 

round (final_newlO$sebe, 3) 

## [1] 0.076 0.104 0.106 0.093 0.007 0.000 


On the basis of the above results, we can evaluate the effect of the covariates on the 
probability of reporting a certain level of the health status. In particular, women tend to 
report worse health status than men (the odds ratio for females versus males is equal to 
exp(—0.226) = 0.798), whereas white individuals have a higher probability of reporting 
a good health status with respect to non-white ones (the odds ratio for non-white versus 
white is equal to exp(—1.424) = 0.241). We also observe that better educated individuals 
tend to have a higher opinion about their health status. Finally, the effect of age is 
negative and its trend is linear due to the fact that the quadratic term of age is not 
significant. 

In this example, the time-varying random effects are used to account for the unobserved 
heterogeneity and interpretation of the latent distribution is not of primary interest. 
The fact that the optimal number of states is k — 10 tells us that there is unobserved 
heterogeneity, that is, that self-reported health status cannot be completely explained 


merely by the few covariates we 
the ordinal longitudinal data see 


r ave used. For another application of the LM model to 


Penn on i and Vittadini (2013]). 


4 Covariates in the latent model 

When the covariates are included in the latent model, we suppose that the response 
variables measure and depend on an individual characteristic of interest (e.g. the quality 
of life), which may evolve over time and is not directly observable. In such a case, the 
main research interest is in modeling the effect of covariates on the latent trait distribution 
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(Bartolucci et al. 


2009). This is illustrated in the following. 


4.1 Assumptions 


A natural way to allow the initial and transition probabilities of the latent Markov chain 
to depend on individual covariates is by adopting a multinomial logit parameterization as 
follows: 


, p(UM =u\X {1) = x) 

log —---- 

p([/(i) = l|X (1) = x) 

^ p(u® = u\U= u, X® = X) 
° g Z u\U I*" 1 ) = u, X w = a;) 


log 1 - (3ou + X f3 lu , u 

IV 

to 

(5) 

7Tl|® 



7T ( ? 



log — 70uu + x 'huut 

t > 2 , 

u 7 ^ u( 6 ) 


u\uX 


In the above expressions, f3 u = ((3 0u , (3' lu )' and ~f uv = ( 70 uuiliuu)' are parameter vectors 
to be estimated which are collected in the matrices /3 and T. 

For a more parsimonious model, instead of using (J 6 ]) we can rely on the following 
parameterization for the transition probabilities, that is, a multinomial logit based on the 
difference between two sets of parameters: 


p(U® = u\U= u, A® = x) 
log —7777-^777777-“777 T\ -7 = lo § “77^ = 7o„ 


p(U® =h|f/d- 1 ) = u,X^ =x) 


7T (t) 

n u\uX 


+ *'(715-7 1 J, (7) 


with t > 2 and u 7 ^ u and where / y 11 = 0 to ensure model identihability. The parameteri¬ 
zation used for modeling the initial probabilities is again based on standard multinomial 
logits, as defined in ([5]). 

As already mentioned, when the covariates affect the distribution of the latent process, 
these covariates are typically excluded from the measurement model by means of the 
constraint 4$ uX = <fi y \ u in the univariate case or 4^y\ uX = fijyW in the multivariate case. 
In this case, we rely on the assumption of time homogeneity for the conditional response 
probabilities which are also independent of the covariates. 

Both parameterizations above, based on ([HD and (171) . are implemented in the R function 
estJm_cov_latent, which allows us to estimate the resulting LM models as we describe in 
detail below. 
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4.2 Application to the health related longitudinal data 

To illustrate function est_lm_covJatent we consider the same data as in Section 13.21 In 
such a context, an interesting research question is about the relationship between the 
self-reported health status and the covariates. When the latter are included in the latent 
model, the initial and transition probabilities are estimated for each configuration of the 
covariates and this may be useful to identify clusters of subjects related to specific needs. 
The R function is based on the following main arguments as input: 

• S: array of available configurations (of dimension n x T x r); 

• XI: matrix of covariates affecting the initial probabilities; 

• X2: array of covariates affecting the transition probabilities; 

• yv: vector of frequencies of the available covariate and response configurations; 

• k: number of latent states; 

• start: equal to 0 for deterministic starting values (default), to 1 for random starting 
values, and to 2 to define initial values as input of the function; 

• tol: tolerance level for the convergence. The default value is 10e-8; 

• maxit: maximum number of iterations before convergence. The default value is 

1000; 

• pa ram: type of parameterization for the transition probabilities, coded as pa ram = 
” multilogit” (default) to allow for the model parameterization defined in (JUD and as 
param = ” difFlogit” to allow for the parameterization defined in (171) : 

• Psi, Be, Ga: initial values of the matrix of the conditional response probabilities 
and of the parameters affecting the logits for the initial and transition probabilities 
when start = 2; 

• output: equal to TRUE to print additional output. FALSE is the default option; 
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• out_se: equal to TRUE to compute the information matrix and standard errors. 
FALSE is the default option; 

• fixPsi: equal to TRUE if the matrix of conditional response probabilities is given in 
input and it is not updated anymore. FALSE is the default option. 

The data matrix is specified as in Section 13.21 and the R function requires the array 
S of available response configurations, again corresponding to n — 7074 subjects, T = 8 
time occasions, and with r = 1 response variable. In this case, the vector of frequencies yv 
is an all-ones vector of length n. As an example, the individual with id==3994 provides 
the following responses: 

S [3994,] 

## R1SHLT R2SHLT R3SHLT R4SHLT R5SHLT R6SHLT R7SHLT R8SHLT 
## 33433334 

The model is specified through the design matrices for the initial and transition prob¬ 
abilities. More in detail, it is necessary to generate the design matrix XI containing the 
covariates affecting the initial probabilities of the latent process. In the present appli¬ 
cation, we have n = 7074 subjects and 6 covariates, defined for the first time occasion 
as: 


head(Xl) 


[,1] [,2] [,3] C, 4] [, 5] [, 6] 


## 

## [1,] 
## [ 2 ,] 
## [3,] 
## [4,] 
## [5,] 
## [ 6 ,] 


0 0 
1 0 
0 0 
1 0 
1 0 
1 0 


0 0 
0 1 
0 0 
0 1 
0 0 
1 0 


6 0.36 
4 0.16 
3 0.09 
-14 1.96 
-4 0.16 
-6 0.36 


Note that the covariates are coded as in the previous example in Section 13.21 (the 
square of age is divided by 100). Accordingly, for t — 2,..., T, the design matrix for the 
covariates affecting the transition probabilities of the latent process, X2, is generated by 
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considering n = 7074 subjects, T — 1 = 7 time occasions, and 6 covariates. For example, 
for the same individual as above with id==3994 the matrix is defined as 


X2 [3994, ,] 


## 


[,1] 

[, 2] 

[, 3] 

[,4] 

[, 5] 

[,6] 

## 

[1,] 

1 

0 

0 

0 

3 

0.09 

## 

[2,] 

1 

0 

0 

0 

5 

0.25 

## 

[3,] 

1 

0 

0 

0 

7 

0.49 

## 

[4,] 

1 

0 

0 

0 

9 

0.81 

## 

[5,] 

1 

0 

0 

0 

11 

1.21 

## 

[6,] 

1 

0 

0 

0 

13 

1.69 

## 

[7,] 

1 

0 

0 

0 

15 

2.25 


From the matrix above, we observe that the subject is a white female with high school 
diploma, who was 53 years old at the second interview. 

Within this illustrative example, we fit the model defined in Section Phil with a number 
of latent states equal to the number of response categories, that is, k — 5, by using the 
following command in R: 


res_5 = est_lm_cov_latent (S = S, XI = XI, X2 = X2, k = 5, start = 0, 
param = "multilogit", output = TRUE) 

The results can be displayed using the functions print and summary. The print function 
provides information about log-likelihood, AIC, and BIC: 


print (res_5) 

## Call: 

## est_lm_cov_latent(S = S, XI = XI, X2 = X2, k = 5, start = 0, 
## param = "multilogit", fort = TRUE, output = TRUE) 

## 

## Convergence info: 

## LogLik np AIC BIC 

## [1,] -62427 188 125229 126520 


The summary function displays the main objects: 
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summary (res_5) 


## Call: 

## est_lm_cov_latent(S = S, XI = XI, X2 = X2, k = 5, start = 0, 
## param = "multilogit", fort = TRUE, output = TRUE) 

## 


## 

Coefficients: 



## 






## 

Be - 

Parameters affecting the 

logit for the ini 

## 



logit 



## 



2 3 

4 

5 

## 

intercept 

0.736261 1.71894 

1.60361 

1.61932 

## 

Xll 


-0.004668 -0.29976 

-0.10427 

-0.23419 

## 

X12 


0.020255 -0.37203 

-1.14359 

-1.43806 

## 

X13 


0.449647 1.12496 

1.49687 

1.76936 

## 

X14 


0.291721 1.70277 

2.48762 

3.00167 

## 

X15 


-0.035662 -0.03377 

-0.04263 

-0.07273 

## 

X16 


0.316996 0.21178 

0.20712 

0.24153 

## 






## 

Ga - 

Parameters affecting the 

logit for the tra: 

## 

, , logit = 

1 



## 






## 



logit 



## 



2 3 

4 

5 

## 

intercept 

-3.4506 -30.3104 -7 

.32596 

-0.7017 

## 

X21 


-0.1625 -10.1897 -1 

.01965 • 

-6.8384 

## 

X22 


-0.2798 -8.4566 1 

.11113 -: 

10.7134 

## 

X23 


-0.1829 0.3931 -8 

.20077 • 

-1.3429 

## 

X24 


0.7692 2.3875 -7 

.43413 • 

-9.0080 

## 

X25 


0.3068 1.8111 0 

.13874 • 

-0.4990 

## 

X26 


-1.4401 -2.9503 -0 

.03292 

0.9238 

## 






## 

, , logit = 

2 



## 






## 



logit 



## 



2 3 

4 

5 

## 

intercept 

-3.06862 -2.17493 - 

16.3128 • 

-14.54901 

## 

X21 


0.25111 -0.17557 

-1.4647 

-0.44729 

## 

X22 


-0.30635 -0.69504 

0.5331 

9.38099 

## 

X23 


0.04224 0.54140 

-8.4763 

-6.88290 

## 

X24 


0.28653 -0.01384 

-2.5061 

-6.96501 
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## 

X25 

0.02297 

-0.04573 1.7420 0.02022 

## 

X26 

-0.06965 

0.06330 -5.9457 0.09913 

## 




## 

, , logit = 

3 


## 




## 


logit 


## 


2 

3 4 5 

## 

intercept 

-4.61886 

-2.00810 -3.438446 -4.03956 

## 

X21 

-0.45067 

-0.26236 -0.190092 -1.89834 

## 

X22 

0.26960 

-0.06015 -0.003967 2.71185 

## 

X23 

-1.79197 

-0.22210 -0.635222 -0.58401 

## 

X24 

-0.50820 

-0.62816 -1.030221 -1.71889 

## 

X25 

-0.03976 

-0.03865 0.022347 -0.08331 

## 

X26 

0.32190 

0.16203 0.007955 -0.18049 

## 




## 

, , logit = 

4 


## 




## 


logit 


## 


2 

3 4 5 

## 

intercept 

-6.99631 

-6.00784 -2.166489 -2.965711 

## 

X21 

-0.55850 

0.21417 -0.165402 -0.756391 

## 

X22 

0.85805 

0.92151 0.390139 0.007904 

## 

X23 

0.85650 

-0.48585 -0.206565 -0.396467 

## 

X24 

-1.09054 

0.02140 -0.443345 -1.582536 

## 

X25 

0.06578 

-0.04761 0.007509 -0.088671 

## 

X26 

0.13126 

0.61353 0.028437 0.311526 

## 




## 

, , logit = 

5 


## 




## 


logit 


## 


2 

3 4 5 

## 

intercept 

-16.217 - 

-2.24574 -2.21537 -1.35550 

## 

X21 

1.429 - 

-2.30980 -0.75273 -0.20117 

## 

X22 

3.383 

0.72911 2.04474 0.19405 

## 

X23 

-6.884 - 

-1.98413 -1.56324 -0.07730 

## 

X24 

-6.674 - 

-4.19165 -1.69996 -0.38578 

## 

X25 

1.678 

0.03181 -0.03745 -0.01392 

## 

X26 

-8.260 - 

-1.19747 0.03455 0.05452 

## 




## 
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## Psi - Conditional response probabilities: 
## , , item = 1 
## 


## 

## category 


state 


1 


2 


3 


4 


5 


## 

## 

## 

## 

## 


0 0.698132 0.059749 0.004321 0.001702 0.0004021 

1 0.266964 0.682778 0.084389 0.008304 0.0013713 

2 0.025484 0.220297 0.713140 0.145421 0.0313998 

3 0.008256 0.028767 0.181283 0.756656 0.1871161 

4 0.001165 0.008409 0.016866 0.087918 0.7797108 


On the basis of the estimated conditional response probabilities displayed above, we 
observe that the latent states are ordered: subjects classified in the first latent state have 
higher probability of reporting the worst health status, whereas subjects classified in the 
last state have a higher probability of reporting the best health status. 

The parameter matrix Be has large and positive intercepts, indicating a general ten¬ 
dency to start in good health status. Gender log-odds (second row of Be) are all negative, 
indicating that females report worse health status than males at the first occasion. The 
fourth and five rows give large positive parameters, indicating that better education leads 
to better health. Finally, the negative parameters for age (sixth row) indicate that elder 
subjects generally start into a worse health status. 

By using the option output=TRUE the function also returns some additional outputs. 
In particular, it is possible to obtain the estimated initial probability matrix, Piv, and 
the estimated transition probabilities matrix, PI. Accordingly, it may be of interest to 
compute the average initial and transition probability matrices for a group of subjects of 
interest. For example, if we consider white females with high school diploma, we obtain 
the corresponding average estimated initial and transition probabilities by means of the 
following commands: 

indl = (data[,9]==2 & data[,10]==l & data[, 11] ==1) 
pivl = roundfcolMeans (res_5$Piv[indl,]) ,4) 
print (pivl) 


## 1 2 3 4 5 

## 0.0705 0.1438 0.2736 0.2841 0.2280 
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PI1 = round(apply (res_5$PI[,,indl,2:TT] ,c(l,2) ,mean) ,4) 
print (PI1) 

## state 

## state 12345 

## 1 0.9150 0.0819 0.0000 0.0016 0.0015 

## 2 0.0611 0.8811 0.0556 0.0023 0.0000 

## 3 0.0067 0.0781 0.8827 0.0316 0.0008 

## 4 0.0016 0.0076 0.0989 0.8782 0.0136 

## 5 0.0005 0.0029 0.0286 0.1598 0.8082 


In a similar way, it is possible to compute the average initial and transition probabilities 
for non-white females with high educational level: 


ind2 = (data[,9]==2 & data[,10]>=2 & data[, 11] ==1) 
piv2 = round(colMeans(res_5$Piv[ind2,]) ,4) 
print (piv2) 

## 1 2 3 4 5 

## 0.1286 0.2652 0.3424 0.1647 0.0990 


PI2 = round(apply (res_5$PI[,,ind2,2:TT] ,c(l,2) ,mean) ,4) 
print (PI2) 

## state 

## state 12345 

## 1 0.9311 0.0641 0.0000 0.0047 0.0000 

## 2 0.0468 0.9148 0.0289 0.0042 0.0053 

## 3 0.0085 0.0727 0.8755 0.0313 0.0119 

## 4 0.0035 0.0146 0.1381 0.8306 0.0132 

## 5 0.0121 0.0045 0.1768 0.1559 0.6507 


From the above results we observe that, at the beginning of the period of observation, 
white females have a probability of around 0.50 of being in the last two states correspond¬ 
ing to very good or excellent health conditions, whereas non-white females have a higher 
probability of being in the first three states, corresponding to worse health conditions. 
Moreover, the estimated transition probability matrices show a quite high persistence in 
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the same latent state for both groups of individuals. However, the lower diagonal ele¬ 
ments of the transition matrix related to non-white females are higher than those related 
to white females, except for some transitions, showing a worse perception of their health 
status. 

4.2.1 Decoding 

Local and global decoding are implemented in the R function decoding, which allows us to 
predict the sequence of latent states for a certain sample unit on the basis of the output 
of the main functions for estimation in the package. Function decoding requires certain 
arguments in input: 

• est: output from the main functions for estimation (i.e., est_lm_basic, estJrrucovJatent, 
estJm_cov_manifest, or estJmjnixed illustrated in the following); 

• Y: vector or matrix of responses; 

• XI : matrix of covariates affecting the initial probabilities (for est_lm_covJatent) or 
affecting the distribution of the responses (for estJim_cov_manifest); 

• X2 : array of covariates affecting the transition probabilities (for estJrmcovJatent). 

For the application above, the most likely sequence of latent states for all sample units 
may be obtained by the following command: 

out_5 = decoding(res_5,S,Xl,X2) 


head (out_5$Ug) 








## 

[,1] [, 2] 

[, 3] 

[>4] 

[, 5] 

[, 6] 

[>7] 

[,8] 

## [1,1 

2 

2 

3 

3 

3 

3 

3 

3 

## [2,] 

3 

3 

3 

3 

3 

3 

3 

3 

## [3,] 

5 

5 

4 

3 

3 

3 

3 

3 

## [4,] 

3 

3 

2 

2 

2 

2 

2 

2 

## [5,] 

5 

5 

5 

5 

5 

5 

5 

5 

## [6,] 

4 

4 

4 

4 

4 

4 

4 

4 


For instance, we conclude that for the first subject there is only a transition, at the 
third time occasion, from the second to the third latent state. 
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5 Mixed latent Markov model 


Another relevant extension of LM models may be formulated to take into account ad¬ 
ditional sources of (time-fixed) depe ndence in the data. In this pap er, we provide an 
illustration of the mixed LM model (Ivan de Pol and Langeheinel . Il990l) in which the pa¬ 
rameters of the latent process are allowed to vary in different latent subpopulations defined 
by an additional discrete latent variable. 


5.1 Assumptions 

Let U be a discrete latent variable, which is time invariant. The latent process is now 
denoted by V = (V^ 1 ),..., V <T> ) which substitutes the symbol U used in the previous 
sections. In such a context, the variables in V follow a first order Markov chain only 
conditionally on U. This additional latent variable has k\ support points (corresponding 
to latent classes) and mass probabilities denoted by X u , u = 1,..., k\. Accordingly, we 
denote by ki the number of latent states, corresponding to the number of support points 
of every latent variable V^\ t — 1,..., T. 

Note that, under this approach, which may be useful from a perspective of clustering, 
the initial and transition probabilities of the latent Markov chain differ between sample 
units in a way that does not depend on the observable covariates. 

In such a context, parameters to be estimated are the conditional response probabili¬ 
ties, which may be formulated as 

</>jy\v = P( Y j t] = y\V W =v), t = l,...,T, y = 0,..., cj - 1, v = 1,... ,k 2 , 

the initial probabilities 

n v \u = p(V {1) = v\U = u), u = 1,..., ki, v = 1,..., k 2 , 
and the transition probabilities 

ir v \uv = p{V® = v\U = u, V ( * _1) = v), u = 1,..., k u v, v = 1,... ,k 2 . 

The model here illustrated relies on the assumption that the conditional response 
probabilities and the transition probabilities are time-homogeneous. Obviously, this for- 





mulation may be extended by also inc 


previous sections; see 


Bartolucci et al. 


uding observable covariates as illustrated in the 


(2013) for a description. 


The manifest distribution of Y may be obtained by extending the rules given in Section 
12.51 In particular, the conditional distribution of V given U may be formulated as 


p(V — V \U — u) — TT vW \ u 


t> 1 


where v = (V 1 ),..., v^) denotes a realization of V. Given the assumption of local 
independence that is maintained under this model, the conditional distribution of Y 
given U and V reduces to 

p(Y = y\U = u,V = v ) =p(Y = y\V = v ) = JJ (j> y (.t)\ v w = nil^V*)’ 

t t j 3 

whereas the conditional distribution of Y given U is expressed as 

P(X = V\U = U) = y^7r„(i)| u 7r„(2) |m,(i) • • ■ n v (T)\ uv (T-l)<f)y(i)\ v p) . . .4>y(X) |„(T). 

V 

Finally, the manifest distribution of Y is now obtained by the following sum 

ki 

p(y) = p(Y = y) = ^p(Y = y\U = u)X u , 


u— 1 


which depends on the mass probabilities for the distribution of the latent variab 


in this case p(y) may be computed through a forward recursion ( Baum e t al. 


e U. Even 


1970). 


Referred to the maximum likelihood estimation of the mixed LM model formulated 
above, we can extend the procedure illustrated in Section 12.21 where the complete data 
log-likelihood has the following expression: 

r ( 0 ) = 

j =1 t= 1 v=l y =0 


ki 

E 

U= 1 


k,2 T k,2 k2 

b i™ lo § n v\ u + b log 

^=1 t =2 v=l v=l 


ki 


+ ^2 C u log X u . 


U= 1 


In the previous expression, a p!y is the number of sample units that are in latent state 
v at occasion t and provide response y to variable j. Moreover, with reference to latent 
class u and occasion t, bul is number of sample units in latent state v, and bf^ v is the 
number of transitions from state v to state v. Finally, c u is the overall number of sample 
units that are in latent class u. 
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5.2 Application example on longitudinal data from criminology 


The mixed LM model is illustrated by using a simula ted d a 


which strictly follows the one analy zed in 


(120 ioh . 


Bartolucci et al. 


Bartolucci et al. 


.aset (for reasons of priv acy) 


(120071 ); see also 


Francis et al. 


020131 ). and iPe nno ni (120141) . The data concern the complete con¬ 


viction histories of a cohort of offenders followed from the age of criminal responsibility, 10 
years. The offense code has been reduced to 73 major offenses and they have been grouped 
according to the Research Development and Statistics Directorate (1998 on the basis of 
the following main typologies: ‘violence against the person’, ‘sexual offenses’, ‘burglary, 
robbery’, ‘theft and handling stolen goods’, ‘fraud and forgery’, ‘criminal damage’, ‘drug 
offenses’, ‘motoring offenses’, and ‘other offenses’. 

For the simulated data, we consider n = 10000 subjects (including the proportion of 
non-offenders): 4800 females and 5200 males. We also consider T — 6 age bands with a 
five years window (10-15, 16-20, 21-25, 26-30, 31-35, and 36-40 years) and r = 10 binary 
response variables. For each age band, every response variable is equal to 1 if the subject 
has been convicted for a crime of the corresponding offense group and to 0 otherwise. 


Then, the data matrix, reported below in 
of the same parameter estimates obtained in 


ong format, has been simulated on the basis 


Bartolucci et al. 


(120071 ). 


data(data_criminal_sim) 

data = as.matrix (data criminal sim) 


head (data) 




## 

id 

sex 

time 

yi y 2 

## Cl ,] 

1 

1 

1 

0 0 

## [2,] 

1 

1 

2 

0 0 

## [3,] 

1 

1 

3 

0 0 

## [4,] 

1 

1 

4 

0 0 

## [5,] 

1 

1 

5 

0 0 

## [6,] 

1 

1 

6 

0 0 


y3 y4 y5 y6 y7 y8 y9 ylO 
0000000 0 
0000000 0 
0000000 0 
0000000 0 
0000000 0 
0000000 0 


The first column of the data matrix is the id code of the subject, whereas the covariate 
gender (second column) is coded as 1 for male and 2 for female, the column named time 
‘‘http://discover.ukdataservice.ac.uk/catalogue/?sn=3935 
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is referred to the age bands, and the last ten columns are related to the binary response 
variables. In order to prepare the data, we run function long2wide, included in the package, 
which is aimed at transforming these data from the long to the wide format. This function 
produces, among others, the array YY of response configurations, the array XX of covariate 
configurations, and the vector freq of the corresponding frequencies. 

The R function aimed at estimating the class of mixed LM models is estjrrunixed, 
which requires the following arguments as input: 

• S: array of response configurations (n x T x r); 

• yv: vector of frequencies of the configurations; 

• kl: number of support points, corresponding to latent classes, of the distribution of 
the latent variable U ; 

• k2: number of support points, corresponding to latent states, of latent variables 
denoting the latent process V ; 

• start: equal to 0 for deterministic starting values (default value) and to 1 for random 
starting values; 

• tol: tolerance level for convergence. The default value is 10e-8; 

• maxit: maximum number of iterations before convergence. The default value is 

1000 ; 

• out_se: equal to TRUE to calculate the information matrix and the standard errors. 
FALSE is the default option. 

For the data at hand, the design matrix for the responses, S (corresponding to YY), 
is given by 915 different response configurations, T = 6 age bands, and r = 10 response 
variables. Similarly, for the covariate gender, the matrix XX is given by 915 configurations 
and T = 6 age bands. In the following we show two different response and covariate 
configurations and the corresponding frequency in the sample (given by the vector freq): 
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S = YY 
S [148 , ,] 


## [, 1 ] [, 2 ] [ 
## [ 1 ,] 0 0 

## [ 2 ,] 0 0 

## [3,] 0 0 

## [4,] 0 0 

## [5,] 0 0 

## [ 6 ,] 1 0 

XX [148,] 

## [ 1 ] 2 2 2 2 2 2 
freq [148] 

## [1] 3 


3] [,4] [, 5] [, 6] 
0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 


[,7] [,8] [,9] [,10] 
0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 


S [149 , , ] 


## [, 1 ] 
## [ 1 ,] 0 

## [ 2 ,] 0 

## [3,] 0 

## [4,] 0 

## [ 5 ,] 0 

## [ 6 ,] 0 


[, 2] [,3] [,4] [, 5] 
0 0 0 1 

0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 0 


[,6] [,7] [,8] 
0 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 


[, 9 ] [, 10 ] 

0 0 

0 0 

0 0 

0 0 

0 0 

0 0 


XX [149,] 


## [ 1 ] 2 2 2 2 2 2 


freq[149] 
## [1] 113 


From the above configurations, we observe that only 3 females have been convicted for 
violence against the person (first response variable) in the last age band, which is from 
36 to 40 years old, whereas 113 females committed theft (fifth variable) during the first 
time window, related to age 10-15. 
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To illustrate the use of the function est_lm_mixed, we fit the model in Section [5] on 
such data with k\ — 2 latent classes, k 2 = 2 latent states, and we restrict the analysis to 
females. We use the following command in R: 

S = S [XX [, 1] ==2, ,] ; 
freq = freq[XX [, 1] ==2] 


res_mix2f = est_lm_mixed(S, yv = freq, kl = 2, k2 = 2, tol = 10~-8) 

Then, we get the following convergence info: 

print (res_mix2f) 

## Call: 

## est_lm_mixed(S = YY, yv = freq, kl = 2, k2 = 2, tol = 10~-8) 

## 

## Convergence info: 

## LogLik np BIC 

## [1,] -18347 27 36925 

The summary command shows, among others, the estimated mass probabilities X u , 
and the estimated initial and transition probabilities, n v \ u and n v \ U v' 

summary (res_mix2f) 

## Call: 

## est_lm_mixed(S = YY, yv = freq, kl = 2, k2 = 2, tol = 1CT-8) 

## 

## Coefficients: 

## 

## Mass probabilities: 

## [1] 0.2175 0.7825 
## 

## Initial probabilities: 

## u 

## v 1 2 


## 1 1.000e+00 0.90867 
## 2 1.644e-22 0.09133 


33 


## 

## Transition probabilities: 

## , , u = 1 
## 

## vl 

## vO 1 2 

## 1 0.8525 0.1475 

## 2 0.6414 0.3586 

## 

## , , u = 2 
## 

## vl 

## vO 1 2 

## 1 1.0000 2.455e-15 

## 2 0.3382 6.618e-01 

## 

## 

## Conditional response probabilities: 


## 

J 

J 

j = 1 


## 





## 


V 


## 

y 


1 

2 

## 


0 

0.995161 

0.8242 

## 


1 

0.004839 

0.1758 

## 





## 


3 

j = 2 


## 





## 


V 


## 

y 


1 

2 

## 


0 

0.998296 

0.9809 

## 


1 

0.001704 

0.0191 

## 





## 


3 

j = 3 


## 





## 


V 


## 

y 


1 

2 

## 


0 

0.996297 

0.7436 

## 


1 

0.003703 

0.2564 

## 





## 

5 

3 

j = 4 
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## 




## 


V 

## 

y 


1 2 

## 


0 

9.999e-01 0.9737 

## 


1 

9.691e-05 0.0263 

## 




## 

> 

J 

IX) 

II 

•1—) 

## 




## 


V 

## 

y 


1 2 

## 


0 

0.97726 0.4546 

## 


1 

0.02274 0.5454 

## 




## 


> 

j = 6 

## 




## 


V 

## 

y 


1 2 

## 


0 

0.998152 0.8892 

## 


1 

0.001848 0.1108 

## 




## 

> 

J 

j = 7 

## 




## 


V 

## 

y 


1 2 

## 


0 

0.995658 0.8177 

## 


1 

0.004342 0.1823 

## 




## 

> 

> 

00 

ii 

■' i 

## 




## 


V 

## 

y 


1 2 

## 


0 

0.997612 0.91046 

## 


1 

0.002388 0.08954 

## 




## 

> 

5 

C__i. 

II 

CO 

## 




## 


V 

## 

y 


1 2 

## 


0 

9.999e-01 0.98149 

## 


1 

6.439e-05 0.01851 
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## 

## , , j = 10 
## 

## v 

## y 12 

## 0 0.998664 0.7912 

## 1 0.001336 0.2088 

The estimated conditional probability of committing each type of crime, <pji\ v , may be 
also displayed as in the following 

round (res_mix2f$Psi [2, ,],digits=3) 

## j 

## v 123456789 10 

## 1 0.005 0.002 0.004 0.000 0.023 0.002 0.004 0.002 0.000 0.001 

## 2 0.176 0.019 0.256 0.026 0.545 0.111 0.182 0.090 0.019 0.209 

According to these estimated probabilities, we can identify the first latent state as 
that related to those females with null or very low tendency to commit crimes, whereas 
the second latent state corresponds to those criminals having mainly as type of activity 
theft, burglary, and some other types of offense. 

The model formulation allows us to characterize the two clusters of individuals at 
the beginning of the period of observation and to follow their evolution over time. In 
particular, the first cluster, which includes around 22% of females, is characterized by 
subjects that, at the beginning of the period of observation, are all in the first latent 
state (corresponding to null tendency to commit a crime). On the other hand, females 
classified in the second cluster (78%) are characterized by an initial probability of being in 
the second latent state of around 0.09. By comparing the transition probability matrices, 
we observe, within each cluster, a very high level of persistence in the first latent state. 
Moreover, females classified in the first cluster present a higher probability (of around 
0.64) to move from the second to the first state than those assigned to the second cluster 
(0.34), revealing a more pronounced tendency to improve in their behavior. 
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6 Conclusions 


We propose and illustrate the R package LMest to deal with latent Markov (LM) models. 


or a comp rehensive overview abou 


201 3) and 


Bartolucci et ah 


these models we refer the reader to 


Bartolucci et al. 


fl2014bn . The package allows us to efficiently fit LM mod¬ 


els for categorical longitudinal data. Both the manifest and latent distributions can be 
parameterized so as to include the effect individual covariates. The mixed formulation 
also allows us to include additional latent variables in these parameterizations. It shall 
be noted that all functions above can be used with multivariate categorical outcomes, 
with the only exception of est_lm_cov_manifest which is restricted to univariate categorical 
outcomes. 

Overall, we consider this package as a relevant advance for applied research in longi¬ 
tudinal data analysis in the presence of categorical response variables. In particular, we 
recall that in this context LM models are particularly useful at least from three different 
perspectives: (i) to represent and study the evolution of an individual characteristic (e.g., 
quality of life) that is not directly observable, (ii) to account for unobserved heterogeneity 
due to omitted covariates in a time-varying fashion, and (in) to account for measurement 
errors in observing a sequence of categorical response variables. We recall that, when 
covariates are available, they are typically included in the measurement model for appli¬ 
cations of type (ii), so that the response variables are affected by observed covariates and 
latent variables on the same footing, whereas the covariates are included in the latent 
models for applications of type («) and (in), so that they affect the distribution of the 
latent process. 

Further updates of the package will include the possibility to use multivariate out¬ 
comes in function est_lm_cov_manifest and new functions with different formulations of 


mixed LM models, also for sample units collected in clusters (IBartohicci et al, 20 1 lj) 


Additionally, the pack age w ill all ow us to work with continuous outcomes (also modeling 


quantiles of these as in 


Far c omeniJ . 


Bartolucci and Farcomeni, 


2012J), and to deal with informative missing values (e.g., 


2014a). We also plan to include estimation methods which are 


alternative to pure maximum likelihood estimation, as the three-step method proposed 
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by 


Bartolucci et al. 


(2015). 
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